Process Corner

Coffee is Cold

New services must be tamed!

First, let me say that this article is strictly from a technician's point of view. Thus if you recognize a somewhat biased attitude toward the role of a technician who has to support a lot of different services and the underlying applications, I plead guilty!

In my three years in UTS, I have found instances where a service seems to have started as a "let's try this great idea" type of thing, only to evolve into something that users depend on and expect to be readily available (I call this user-creep). In my opinion, three such examples are JabbR, the Confluence wiki and Subversion (none of which, I believe, appears in our service catalog).

I am familiar with ITIL Operational Support and Analysis and Release, Control and Validation (I even have the ITIL certificates to prove it). However, I believe that there need to be additional criteria that a service meets before it is released and made available to users.

So, here are MY five rules/questions for releasing a new service (and creating happy, or at least less unhappy, technicians):

Rule 1: Is the service fault-tolerant (clustered) and/or load-balanced? If not, does the service at least have a hot standby? (Extra credit: what’s the difference?) If neither of these is true, are users and customers made aware that patching or upgrading the service could (and probably will) lead to extended periods of downtime? Without some redundancy, it is impossible for technicians to patch or upgrade the underlying application without extended service outages, which lead to both frustrated technicians and unhappy users and customers.

Rule 2: At least initially, are there technicians in place who understand and can support the service? (NOTE: I said "technicians," NOT technician.) A single technician is nothing more than a single point of failure (and thus Rule 1 is already broken).

Rule 3: Is there a test environment that accurately reflects the PROD environment? Without this, it is impossible to test patches and upgrades before applying them to the production service. This causes a great deal of uncertainty for the technician who has to apply the upgrade or patch and, again, possibly extended service outages. If such an environment exists, I freely admit that it is the technician's responsibility to ensure that the environment is kept up to date.

Rule 4: If the service uses a database or other means of storing data, is that data backed up? The pitfalls here are obvious and could be catastrophic for the user, the organization and the reputation of UTS (and the job security of the technician). Additionally, this becomes even more critical if Rule 1 is not met.

Rule 5: Special customizations and/or configurations are BAD!!! Ever heard of the KISS principle? If the underlying application that provides a service uses any special customizations or configurations, are those customizations and configurations thoroughly documented? Are the reasons for those customizations and configurations thoroughly explained? Why would we want to use an application that requires a high degree of customization or configuration or that runs in such a manner that the application vendors cannot or are unwilling to provide support for it? As part of the service installation, there should be a requirement to provide documentation that thoroughly explains that installation and why ANY customization was necessary.

Without the documentation, and perhaps even with the documentation, upgrading or patching that application can easily become a nightmare. We have to avoid what I call the "smart-guy syndrome." This is where you have an individual who is really gifted and who takes an application and cuts, pokes and prods to make that service and the underlying application work. Unfortunately, that application now runs in a manner that the vendor never intended or perhaps even envisioned.

The pitfall here is that the smart-guy leaves the organization and dumb-guys like me have to support the service. (A simple test: install the service, document the installation and then have another technician not on your team re-install it from that documentation. A simpler test: is there an O'Reilly book that documents this configuration?) Also, because the application has such a high degree of customization/configuration (and sometimes even a low degree), vendor support is effectively broken.
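For anyone who would rather see the five rules as a checklist, here is a minimal sketch of how they might be written down as a pre-release check. Everything in it is hypothetical: the `ServiceReadiness` fields, the `unmet_rules` function and the "at least two technicians" threshold are one reading of the rules above, not an existing UTS tool or an ITIL artifact.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceReadiness:
    """Answers to the five release questions (all field names are hypothetical)."""
    name: str
    has_redundancy: bool             # Rule 1: clustered, load-balanced or hot standby
    downtime_risk_announced: bool    # Rule 1 fallback: users warned about outages
    trained_technicians: int         # Rule 2: people who can support the service
    has_test_environment: bool       # Rule 3: test environment that reflects PROD
    stores_data: bool                # Rule 4: does the service store data at all?
    data_backed_up: bool             # Rule 4: is that data backed up?
    customized: bool                 # Rule 5: any special customization/configuration
    customizations_documented: bool  # Rule 5: documented, with reasons


def unmet_rules(service: ServiceReadiness) -> List[int]:
    """Return the numbers of the rules the service does not yet satisfy."""
    failed = []
    # Rule 1 passes with redundancy, or at least with an honest warning to users.
    if not (service.has_redundancy or service.downtime_risk_announced):
        failed.append(1)
    # Rule 2: a single technician is a single point of failure.
    if service.trained_technicians < 2:
        failed.append(2)
    if not service.has_test_environment:
        failed.append(3)
    # Rule 4 only applies when there is data to lose.
    if service.stores_data and not service.data_backed_up:
        failed.append(4)
    # Rule 5 only applies when the installation has been customized.
    if service.customized and not service.customizations_documented:
        failed.append(5)
    return failed


example = ServiceReadiness(
    name="example-wiki",
    has_redundancy=False,
    downtime_risk_announced=False,
    trained_technicians=1,
    has_test_environment=False,
    stores_data=True,
    data_backed_up=True,
    customized=True,
    customizations_documented=False,
)
print(unmet_rules(example))  # [1, 2, 3, 5]
```

An empty list would mean the service answers all five questions; anything else is a list of conversations to have before the service goes live.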

I am sure that all of you have seen the message that is often posted during a service outage. It typically goes something like this: "Service <insert name here> is unavailable. Technicians are frantically working to restore the service."

Obviously, I inserted the word "frantically," but that does provide a better picture of what is actually going on. I would hypothesize that if my five rules were taken into account, we would see far fewer of these messages and, when problems do arise, the service would be restored much more quickly. As service technicians, we often have to deal with problems that inevitably occur late on a Friday evening (the Friday Curse), before a holiday or at 2 a.m., and we are expected to always provide the "silver bullet" fix for the problem (a little-known fact: our gun is sometimes empty).

I was in the Army for twenty years and three days, and I can tell you from experience that two of the most underappreciated jobs are the cook and the mechanic (and yet I have seen commanders argue quite aggressively to get that good cook or knowledgeable mechanic assigned to their unit). Both have long, nontraditional hours and rarely get noticed unless the coffee is cold or the vehicle is broken. I find this somewhat ironic in that the Army would soon come to a screeching halt without these basic functions; similarly, I would suggest that the inglorious technicians are the glue that keeps UTS services and the underlying applications humming along. Although the actual job of a service technician and that of an Army cook or mechanic could not be more dissimilar, I do believe that from a customer's or user's perspective, all three are very similar.

For us, "the coffee is cold."

- Gerry Hall, Application Developer, Integration